Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm.

Authors

  • Zhou Zhu
  • Yitzhak Pilpel
  • George M Church
Abstract

While microarray-based expression profiling has facilitated the use of computational methods to find potential cis-regulatory promoter elements, few current in silico approaches explicitly link regulatory motifs with the transcription factors that bind them. We have thus developed a TF-centric clustering (TFCC) algorithm that may provide such missing information through incorporation of biological knowledge about TFs. TFCC is a semi-supervised clustering algorithm which relies on the assumption that the expression profiles of some TFs may be related to those of the genes under their control. We examined this premise and found the vicinities of TFs in expression space are often enriched with the genes they regulate. So, instead of clustering genes based on the mutual similarity of their expression profiles to each other, we used TFs as seeds to group together genes whose expression patterns correlate with that of a particular TF. Then a Gibbs sampling algorithm was applied to search for shared cis-regulatory elements in promoters of clustered genes. Our working hypothesis was that if a TF-centric cluster indeed contains many targets of the seeding TF, at least one of the discovered motifs would be the site bound by the very same TF. We tested the TFCC approach on eight cell cycle and sporulation regulating TFs whose binding sites have been previously characterized in Saccharomyces cerevisiae, and correctly identified binding site motifs for half of them. In addition, we also made de novo predictions for some unknown TF binding sites.
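The seeding step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cutoff TFCC uses, so the Pearson correlation and the `min_corr` threshold below are assumptions, and the function names (`pearson`, `tf_centric_cluster`) and toy profiles are hypothetical. The Gibbs sampling motif search that follows clustering is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def tf_centric_cluster(profiles, tf_name, min_corr=0.8):
    """Group genes around a seed TF instead of around each other.

    profiles: dict mapping gene name -> list of expression values.
    tf_name:  key of the seeding transcription factor in `profiles`.
    min_corr: assumed cutoff; the abstract does not give one.
    Returns (gene, correlation) pairs sorted by decreasing correlation.
    """
    seed = profiles[tf_name]
    scored = [
        (gene, pearson(seed, profile))
        for gene, profile in profiles.items()
        if gene != tf_name
    ]
    members = [(gene, r) for gene, r in scored if r >= min_corr]
    return sorted(members, key=lambda pair: pair[1], reverse=True)


# Toy expression data (invented for illustration only).
profiles = {
    "TF_A": [1.0, 2.0, 3.0, 4.0],
    "gene1": [2.0, 4.1, 6.0, 8.2],   # rises with the TF
    "gene2": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with the TF
    "gene3": [1.0, 1.0, 1.0, 1.0],   # flat profile
}
cluster = tf_centric_cluster(profiles, "TF_A")
print([gene for gene, _ in cluster])
```

In this sketch only genes close to the seed TF in expression space enter the cluster; their promoters would then be the input to a motif finder.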

Similar resources

CUBIC: Identification of Regulatory Binding Sites Through Data Clustering

Transcription factor binding sites are short fragments in the upstream regions of genes, to which transcription factors bind to regulate the transcription of genes into mRNA. Computational identification of transcription factor binding sites remains an unsolved challenging problem though a great amount of effort has been put into the study of this problem. We have recently developed a novel tec...


Mapping of Transcription Factor Binding Region of Kappa Casein (CSN3) Gene in Iranian Bacterianus and Dromedaries Camels

κ-casein is a glycosylated protein in mammalian milk that plays an essential role in the milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. Transcriptional regulation, a first mechanism for controlling the development of organisms, is carried out by transcription facto...

Genomic Promoter Analysis Predicts Functional Transcription Factor Binding

Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of thos...


Homocysteine Induces Heme Oxygenase-1 Expression via Transcription Factor Nrf2 Activation in HepG2 Cells

Background: Elevated level of plasma homocysteine has been related to various diseases. Patients with hyperhomocysteinemia can develop hepatic steatosis and fibrosis. We hypothesized that oxidative stress induced by homocysteine might play an important role in pathogenesis of liver injury. Also, the cellular response designed to combat oxidative stress is primarily controlled by the transcripti...


Journal:
  • Journal of molecular biology

Volume 318, Issue 1

Pages: -

Publication date: 2002